Project Description

Investigate user behavior for a food company's app.

  1. Study the sales funnel. Find out how users reach the purchase stage. How many users actually make it to this stage? How many get stuck at previous stages? Which stages in particular?
  2. Look at the results of an A/A/B test to decide whether to change the fonts for the entire app.

The users are split into three groups: two control groups get the old fonts and one test group gets the new ones. The goal: Find out which set of fonts produces better results. Creating two A groups has certain advantages. We can make it a principle that we will only be confident in the accuracy of our testing when the two control groups are similar. If there are significant differences between the A groups, this can help us uncover factors that may be distorting the results. Comparing control groups also tells us how much time and data we'll need when running further tests.

Table of Contents

1. Open the data file and read the general information

2. Prepare the data for analysis

It seems that the duplicates appear in all 5 events, across 237 users, 352 dates, and all 3 groups. In addition, the problem seems to span the whole week (30/7 - 7/8). This means that the data is very corrupted.
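A minimal sketch of how such a duplicate check could look in pandas. The column names (event, user_id, timestamp, group) and the sample rows are assumptions made for illustration, not the real log schema.

```python
import pandas as pd

# Hypothetical log sample; column names and rows are illustrative assumptions.
logs = pd.DataFrame({
    "event": ["MainScreenAppear", "MainScreenAppear", "CartScreenAppear", "Tutorial"],
    "user_id": [1, 1, 2, 3],
    "timestamp": pd.to_datetime([
        "2019-08-01 10:00:00", "2019-08-01 10:00:00",
        "2019-08-02 11:30:00", "2019-08-03 09:15:00",
    ]),
    "group": [246, 246, 247, 248],
})

# keep=False flags every copy of a repeated row, so the summary sees all of them.
dupes = logs[logs.duplicated(keep=False)]

# How widely the duplicates are spread across events, users and groups.
dupe_summary = {
    "rows": len(dupes),
    "events": dupes["event"].nunique(),
    "users": dupes["user_id"].nunique(),
    "groups": dupes["group"].nunique(),
}
print(dupe_summary)

# Drop the repeats, keeping the first copy of each row.
clean = logs.drop_duplicates().reset_index(drop=True)
print(len(logs), "->", len(clean))
```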

3. Study and check the data

How many events are in the logs?

There are 5 events:

  1. MainScreenAppear
  2. PaymentScreenSuccessful
  3. CartScreenAppear
  4. OffersScreenAppear
  5. Tutorial

How many users are in the logs?

There are 7,551 users in the log

What's the average number of events per user?

On average, there are 32 events per user.
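The three counts above (event types, users, events per user) can be computed along these lines. This is only a sketch on a made-up sample; the column names event and user_id are assumptions.

```python
import pandas as pd

# Hypothetical sample of the log; column names are illustrative assumptions.
logs = pd.DataFrame({
    "event": ["MainScreenAppear", "OffersScreenAppear", "MainScreenAppear",
              "CartScreenAppear", "MainScreenAppear", "Tutorial"],
    "user_id": [1, 1, 1, 2, 2, 3],
})

n_event_types = logs["event"].nunique()    # distinct event names
n_users = logs["user_id"].nunique()        # distinct users

# Number of logged events for each user, then the average across users.
events_per_user = logs.groupby("user_id")["event"].count()
avg_events = events_per_user.mean()

print(n_event_types, n_users, avg_events)
```

Since a few very active users can pull the mean up, events_per_user.median() may be worth checking alongside it.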

From the table above it seems that only 2,707 out of 7,551 users completed the 1st event. In addition, less than half of the users (only 3,035 of the 7,551 users) went full circle and completed the 4th event.

What period of time does the data cover? Find the maximum and the minimum date.

The maximum date is: 2019-08-08
The minimum date is: 2019-07-25

The data covers 14 days

Plot a histogram by date and time. Can you be sure that you have equally complete data for the entire period? Older events could end up in some users' logs for technical reasons, and this could skew the overall picture. Find the moment at which the data starts to be complete and ignore the earlier section. What period does the data actually represent?

The period of time that the data covers is from 2019-07-25 to 2019-08-07. From the figure above, it can be seen that there is a peak on August 1st, 2019, which suggests that the data is complete only from this date onward rather than for the entire period.

A possible reason could be that the first week was the recruitment week and there were some technical issues with the server during recruitment.
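Besides the histogram, counting events per calendar day is a quick way to see where the data becomes complete. A minimal sketch, assuming a timestamp column (the name and the sample values are made up):

```python
import pandas as pd

# Hypothetical timestamps: sparse early events, dense from August 1st onward.
logs = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-07-25 10:00", "2019-07-30 12:00",
        "2019-08-01 09:00", "2019-08-01 10:00", "2019-08-01 11:00",
        "2019-08-02 09:00", "2019-08-02 15:00",
    ]),
})

first = logs["timestamp"].min()
last = logs["timestamp"].max()

# Calendar days covered, counting both the first and the last day.
span_days = (last.normalize() - first.normalize()).days + 1

# Events per day; a sharp jump marks the point where the logs become complete.
per_day = logs.groupby(logs["timestamp"].dt.date).size()

print(first, last, span_days)
print(per_day)
```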

Did you lose many events and users when excluding the older data?

Make sure you have users from all three experimental groups.

When excluding data older than August 1st, 2019, we lost 0.23% of the users (17 users) and 1.5% of the events (2,610 events). However, we still have users from all three experimental groups.
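The loss from cutting off the older data can be measured roughly as follows. This is a sketch on a made-up sample, and the column names user_id, timestamp and group are assumptions. As a sanity check on the figures quoted above, 17 users out of 7,551 is about 0.23% of users.

```python
import pandas as pd

# Hypothetical log sample; column names and rows are illustrative assumptions.
logs = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4, 4],
    "timestamp": pd.to_datetime([
        "2019-07-28 12:00", "2019-07-31 23:50", "2019-08-01 00:05",
        "2019-08-02 08:00", "2019-08-03 09:00", "2019-08-05 10:00",
    ]),
    "group": [246, 247, 247, 246, 248, 248],
})

# Keep only events from the cutoff date onward.
cutoff = pd.Timestamp("2019-08-01")
recent = logs[logs["timestamp"] >= cutoff]

lost_events = len(logs) - len(recent)
lost_users = logs["user_id"].nunique() - recent["user_id"].nunique()
lost_events_pct = 100 * lost_events / len(logs)
lost_users_pct = 100 * lost_users / logs["user_id"].nunique()

# Confirm that every experimental group is still represented.
groups_left = sorted(recent["group"].unique().tolist())

print(lost_events, lost_users, groups_left)
print(round(100 * 17 / 7551, 2))  # share of users lost, using the figures in the text
```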

4. Study the event funnel

See what events are in the logs and their frequency of occurrence. Sort them by frequency.

The most frequent event is the main screen, with 117,431 events. The offers screen has 46,350 events, the cart screen has 42,365 events, and the payment screen has 34,113 events. The tutorial is the least frequent event, with 1,039 events.

Find the number of users who performed each of these actions. Sort the events by the number of users.

98% of all users saw the main screen, 61% reached the offers screen, 49% the cart screen, 47% the payment screen, and only 11% did the tutorial.
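Both rankings, by number of events and by number of distinct users, can be produced with a couple of pandas calls. A minimal sketch on a made-up sample; the column names event and user_id are assumptions.

```python
import pandas as pd

# Hypothetical log sample; column names and rows are illustrative assumptions.
logs = pd.DataFrame({
    "event": ["MainScreenAppear", "MainScreenAppear", "OffersScreenAppear",
              "MainScreenAppear", "OffersScreenAppear", "CartScreenAppear",
              "MainScreenAppear", "Tutorial"],
    "user_id": [1, 1, 1, 2, 2, 2, 3, 4],
})

# Frequency of each event (every logged row counts), most frequent first.
event_counts = logs["event"].value_counts()

# Distinct users per event (each user counts once), sorted by number of users.
total_users = logs["user_id"].nunique()
users_per_event = (
    logs.groupby("event")["user_id"].nunique().sort_values(ascending=False)
)

# Share of all users who performed each event at least once.
share = (users_per_event / total_users).round(2)

print(event_counts)
print(share)
```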

Calculate the proportion of users who performed the action at least once

Of the users who reached the main screen, 96% reached it again, which is a very high retention rate. For the offers screen, of the 61% of users who reached it, 85% reached it again. For the cart screen, of the 49% of users who reached it, 84% reached it again. For the payment screen, of the 47% of users who made a payment, 83% made one again.

In what order do you think the actions took place? Are all of them part of a single sequence? You don't need to take them into account when calculating the funnel.

The order in which the actions took place is: Main Screen -> Offer Screen -> Cart Screen -> Payment Successful Screen

Not all of the actions are part of a single sequence: for example, it might be possible to make a purchase without viewing the cart screen, or without seeing the offers screen. In addition, users don't have to go through the tutorial.

Use the event funnel to find the share of users that proceed from each stage to the next. (For instance, for the sequence of events A → B → C, calculate the ratio of users at stage B to the number of users at stage A and the ratio of users at stage C to the number at stage B.)

The groups are split more or less equally.

At what stage do you lose the most users?

From the funnel above, it can be seen that the stage where we lose the most users is the step from the main screen to the offers screen (a 38% decrease in users who proceeded to the offers screen).
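The step-to-step ratios can be reproduced from the shares of users quoted earlier (98%, 61%, 49%, 47%). Because these inputs are rounded percentages rather than raw user counts, the results are approximate.

```python
# Share of all users reaching each stage, in percent, as quoted in the text.
funnel = [
    ("MainScreenAppear", 98),
    ("OffersScreenAppear", 61),
    ("CartScreenAppear", 49),
    ("PaymentScreenSuccessful", 47),
]

# For each pair of consecutive stages: ratio of users kept and share lost.
steps = []
for (prev_name, prev_share), (name, share) in zip(funnel, funnel[1:]):
    ratio = share / prev_share
    steps.append((prev_name, name, round(ratio, 3), round(1 - ratio, 3)))

for prev_name, name, kept, lost in steps:
    print(f"{prev_name} -> {name}: kept {kept:.1%}, lost {lost:.1%}")

# Stage transition with the largest relative loss.
worst = max(steps, key=lambda s: s[3])

# Users reaching payment relative to users who saw the main screen.
end_to_end = round(funnel[-1][1] / funnel[0][1], 3)
print(worst[0], "->", worst[1], end_to_end)
```

With these inputs, the largest relative drop is from the main screen to the offers screen (about 38%), and about 48% of the users who saw the main screen reach payment.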

What share of users make the entire journey from their first event to payment?

As can be seen from the table above, the share of users who make the entire journey from the first event (main screen) to the payment screen is 47%.

5. Study the results of the experiment

How many users are there in each group?

In the first control group (246) there are 2,484 users.
In the second control group (247) there are 2,513 users.
In the test group (248) there are 2,537 users.

The number of users in each group is similar.

We have two control groups in the A/A test, where we check our mechanisms and calculations.

See if there is a statistically significant difference between samples 246 and 247.

To check whether there is a statistically significant difference between samples 246 and 247, we compare proportions: the share of users in each group who performed a given event (the conversion).

We want to test the statistical significance of the difference in conversion between control groups 246 and 247.

The Null Hypothesis H0: There is no statistically significant difference in conversion between control groups 246 and 247.
The Alternative Hypothesis H1: There is a statistically significant difference in conversion between control groups 246 and 247.

For all events, the p-value is greater than the alpha level of 0.05, which means that we cannot reject the null hypothesis: there is no statistically significant difference between the two control groups for any event.

We can confirm that the groups were split properly, since there is no significant difference in the results.
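The text does not say which test was used to compare the proportions. One common choice is the two-proportion z-test with a pooled standard error, sketched below with the standard library only. The function name and the converted-user counts are hypothetical; only the group sizes (2,484 and 2,513) come from the report.

```python
from math import erfc, sqrt


def two_proportion_ztest(successes_a, total_a, successes_b, total_b):
    """Two-sided z-test for the equality of two proportions (pooled variance)."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    # Pooled proportion under H0, where both groups share one conversion rate.
    p_pool = (successes_a + successes_b) / (total_a + total_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) written with the complementary erf.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


alpha = 0.05
# Group sizes from the report; the converted counts are made up for illustration.
z, p_value = two_proportion_ztest(1200, 2484, 1158, 2513)
if p_value < alpha:
    print(f"p-value {p_value:.3f}: reject H0, the conversions differ")
else:
    print(f"p-value {p_value:.3f}: cannot reject H0")
```

The same function can be called once per event and per pair of groups, which is also how the total number of tests adds up.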

Do the same thing for the group with altered fonts. Compare the results with those of each of the control groups for each event in isolation. Compare the results with the combined results for the control groups. What conclusions can you draw from the experiment?

We want to test the statistical significance of the difference in conversion between groups 246 and 248.

The Null Hypothesis H0: There is no statistically significant difference in conversion between control group 246 and test group 248.
The Alternative Hypothesis H1: There is a statistically significant difference in conversion between control group 246 and test group 248.

For all events, the p-value is greater than the alpha level of 0.05, which means that we cannot reject the null hypothesis: there is no statistically significant difference between control group 246 and the test group for any event.

We want to test the statistical significance of the difference in conversion between control group 247 and test group 248.

The Null Hypothesis H0: There is no statistically significant difference in conversion between control group 247 and test group 248.
The Alternative Hypothesis H1: There is a statistically significant difference in conversion between control group 247 and test group 248.

For all events, the p-value is greater than the alpha level of 0.05, which means that we cannot reject the null hypothesis: there is no statistically significant difference between control group 247 and the test group for any event.

Compare the results with the combined results for the control groups. What conclusions can you draw from the experiment?

We want to test the statistical significance of the difference in conversion between the combined results of the control groups (246+247) and 248.

The Null Hypothesis H0: There is no statistically significant difference in conversion between the combined control groups (246+247) and test group 248.
The Alternative Hypothesis H1: There is a statistically significant difference in conversion between the combined control groups (246+247) and test group 248.

For all events, the p-value is greater than the alpha level of 0.05, which means that we cannot reject the null hypothesis: there is no statistically significant difference between the combined control groups and the test group for any event.

Calculate how many statistical hypothesis tests you carried out.

What should the significance level be? If you want to change it, run through the previous steps again and check your conclusions.

Running multiple tests on the same sample increases the chance of a type I error: rejecting the null hypothesis when we shouldn't (the results suggest that the proportions are different when they are not). In a pairwise comparison, the probability that the test will yield a false positive result is equal to the significance level. The family-wise error rate (FWER) is the probability of obtaining at least one such result across all tests. To control the FWER, we can apply the Bonferroni correction to the significance level.

Previously, I set the alpha significance level to 0.05.

Since we ran multiple tests, I used the Bonferroni correction to adjust the significance level and set alpha to 0.01. With the Bonferroni-corrected alpha, I got the same results: for all groups and all events, we cannot reject the null hypothesis, so there is no statistically significant difference between the groups for any event.

Using the Bonferroni correction lowers alpha and so increases the chance of a type II error (failing to reject the null hypothesis when we should). Therefore, the Bonferroni correction decreases the power of the test, and since we got the same results we should prefer the original alpha.
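As a rough illustration of the reasoning above, the sketch below computes the FWER for independent tests and the Bonferroni-adjusted alpha. The text does not state how many tests were counted; the value of 5 used here is an assumption, chosen only because it reproduces the 0.01 quoted above. The function names are hypothetical.

```python
def fwer(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the FWER at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


alpha = 0.05
n_tests = 5  # assumption: the real count depends on events x group comparisons

uncorrected = fwer(alpha, n_tests)
corrected_alpha = bonferroni_alpha(alpha, n_tests)
corrected = fwer(corrected_alpha, n_tests)

print(f"FWER without correction: {uncorrected:.3f}")
print(f"Bonferroni alpha: {corrected_alpha:.4f}")
print(f"FWER with correction: {corrected:.3f}")
```

With five tests at alpha 0.05, the uncorrected FWER is about 0.226, and the corrected per-test alpha is 0.01. Counting every event in every group comparison would give a larger number of tests and therefore a smaller corrected alpha.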

6. Conclusion

In this project I investigated user behavior for the company's app. Studying the sales funnel showed that the most frequent event is the main screen, with 117,431 events. 98% of all users saw the main screen, but only 61% reached the offers screen, 49% the cart screen, 47% the payment screen, and only 11% did the tutorial. The stage where we lose the most users is the step from the main screen to the offers screen (a 38% decrease in users who proceeded to the offers screen). In addition, the share of users who make the entire journey from the main screen to the payment screen is 47%.

From the results of the A/A/B test, it can be concluded that there was no statistically significant difference between any of the groups for any event. Therefore, changing the fonts for the entire app did not produce better results than the old fonts, but it also does not appear to intimidate users.